 lstm architecture


Entropy is all you need for Inter-Seed Cross-Play in Hanabi

Forkel, Johannes, Foerster, Jakob

arXiv.org Artificial Intelligence

We find that in Hanabi, one of the most complex and popular benchmarks for zero-shot coordination and ad-hoc teamplay, a standard implementation of independent PPO with a slightly higher entropy coefficient (0.05 instead of the typically used 0.01) achieves a new state of the art in cross-play between different seeds, outperforming by a significant margin all previous algorithms specifically designed for this setting. We provide an intuition for why sufficiently high entropy regularization ensures that different random seeds produce joint policies that are mutually compatible. We also find empirically that a high $\lambda_{\text{GAE}}$ (around 0.9) and the use of RNNs instead of only feed-forward layers in the actor-critic architecture strongly increase inter-seed cross-play. While these results demonstrate the dramatic effect that hyperparameters can have not just on self-play scores but also on cross-play scores, we also show that there are simple Dec-POMDPs in which standard policy gradient methods with increased entropy regularization cannot achieve perfect inter-seed cross-play, demonstrating the continuing need for new algorithms for zero-shot coordination.
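The abstract singles out two knobs, the entropy coefficient and $\lambda_{\text{GAE}}$. The sketch below shows where each one enters a standard PPO objective. It is a generic, per-sample illustration assuming a discrete action space, the common clipped surrogate with clip range 0.2, and a discount of 0.99; those defaults and all function names are assumptions, not the paper's code.

```python
import math
from typing import List, Sequence


def gae_advantages(
    rewards: Sequence[float],
    values: Sequence[float],
    last_value: float,
    gamma: float = 0.99,
    lam: float = 0.9,
) -> List[float]:
    """Generalized Advantage Estimation over one trajectory fragment.

    Assumes no episode terminates inside the fragment; last_value bootstraps
    the value of the state that follows the final step.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        # One-step TD error at time t.
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of future TD errors; lam sets the decay.
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


def categorical_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def ppo_policy_loss(
    ratio: float,
    advantage: float,
    probs: Sequence[float],
    clip_eps: float = 0.2,
    ent_coef: float = 0.05,
) -> float:
    """Per-sample PPO clipped-surrogate loss with an entropy bonus (to be minimized)."""
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    surrogate = min(ratio * advantage, clipped_ratio * advantage)
    # A larger ent_coef (e.g. 0.05 instead of 0.01) rewards more stochastic policies.
    return -(surrogate + ent_coef * categorical_entropy(probs))
```

Raising `ent_coef` from 0.01 to 0.05 only changes the weight on the entropy term; the surrogate itself is untouched.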


Bayesian Uncertainty Quantification with Anchored Ensembles for Robust EV Power Consumption Prediction

Farhani, Ghazal, Rahman, Taufiq, Humphries, Kieran

arXiv.org Artificial Intelligence

Accurate EV power estimation underpins range prediction and energy management, yet practitioners need both point accuracy and trustworthy uncertainty. We propose an anchored-ensemble Long Short-Term Memory (LSTM) with a Student-t likelihood that jointly captures epistemic (model) and aleatoric (data) uncertainty. Anchoring imposes a Gaussian weight prior (MAP training), yielding posterior-like diversity without test-time sampling, while the t-head provides heavy-tailed robustness and closed-form prediction intervals. Using vehicle-kinematic time series (e.g., speed, motor RPM), our model attains strong accuracy: RMSE 3.36 ± 1.10, MAE 2.21 ± 0.89, R-squared 0.93 ± 0.02, explained variance 0.93 ± 0.02, and delivers well-calibrated uncertainty bands with near-nominal coverage. Against competitive baselines (Student-t MC dropout; quantile regression with/without anchoring), our method matches or improves log-scores while producing sharper intervals at the same coverage. Crucially for real-time deployment, inference is a single deterministic pass per ensemble member (or a weight-averaged collapse), eliminating Monte Carlo latency. The result is a compact, theoretically grounded estimator that couples accuracy, calibration, and systems efficiency, enabling reliable range estimation and decision-making for production EV energy management.
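The two loss ingredients named above, a Student-t likelihood head and a Gaussian anchoring prior, can be sketched as follows. This is a minimal illustration of the general anchored-ensemble recipe, assuming each member's weights are flattened into a list and anchors are drawn once per member from the same zero-mean Gaussian prior; the penalty scaling and all function names are assumptions, not the authors' implementation.

```python
import math
import random
from typing import List, Sequence


def student_t_nll(y: float, mu: float, sigma: float, nu: float) -> float:
    """Negative log-likelihood of y under a Student-t(mu, sigma, nu) head."""
    if sigma <= 0 or nu <= 0:
        raise ValueError("sigma and nu must be positive")
    z = (y - mu) / sigma
    log_pdf = (
        math.lgamma((nu + 1) / 2)
        - math.lgamma(nu / 2)
        - 0.5 * math.log(nu * math.pi)
        - math.log(sigma)
        - (nu + 1) / 2 * math.log1p(z * z / nu)
    )
    return -log_pdf


def draw_anchors(n_params: int, prior_std: float, seed: int) -> List[float]:
    """Draw one ensemble member's anchor point from the prior N(0, prior_std^2)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, prior_std) for _ in range(n_params)]


def anchored_loss(
    nll_sum: float,
    params: Sequence[float],
    anchors: Sequence[float],
    prior_var: float,
) -> float:
    """MAP-style objective: data NLL plus a Gaussian penalty pulling weights toward this member's anchor."""
    penalty = sum((p - a) ** 2 for p, a in zip(params, anchors)) / (2.0 * prior_var)
    return nll_sum + penalty
```

Because each member is regularized toward a different anchor, members trained on the same data end up in different places, which is where the ensemble's epistemic spread comes from; no sampling is needed at inference.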


LSTM-Based Forecasting and Analysis of EV Charging Demand in a Dense Urban Campus

Ressler, Zak, Grijalva, Marcus, Ignacio, Angelica Marie, Torres, Melanie, Rojas, Abelardo Cuadra, Moghadam, Rohollah, Narimani, Mohammad Rasoul

arXiv.org Artificial Intelligence

This paper presents a framework for processing EV charging load data to forecast future charging load using a recurrent neural network (RNN), specifically a Long Short-Term Memory (LSTM) network. The framework processes a large set of raw data from multiple locations and transforms it through normalization and feature extraction to train the LSTM. The pre-processing stage corrects for missing or incomplete values by interpolating and normalizing the measurements. The processed data are then fed into an LSTM model designed to capture short-term fluctuations while also modeling the long-term trends in the charging data. Experimental results demonstrate the model's ability to accurately predict charging demand across multiple time scales (daily, weekly, and monthly), providing valuable insights for infrastructure planning, energy management, and grid integration of EV charging facilities. The system's modular design allows it to be adapted to different charging locations with varying usage patterns, making it applicable across diverse deployment scenarios.

I. INTRODUCTION

The transition to electric vehicles (EVs) is crucial for mitigating climate change by reducing greenhouse gas emissions and reliance on fossil fuels. However, as EV adoption increases [1], the installation of numerous EV charging stations (EVCS) poses challenges to electric grids, particularly in dense communities. The increased demand from EVCS strains electric grid systems, leading to issues such as voltage drops and transformer overloads. Understanding these problems and their impacts is crucial for optimizing grid performance and ensuring sustainable EV infrastructure development. Therefore, accurately predicting EVCS load demand helps manage grid load, improve power network efficiency, and ensure reliable customer access to charging stations.
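The pre-processing stage described in this abstract (interpolating missing values, normalizing, then feeding the LSTM) can be sketched roughly as below. This assumes linear interpolation, min-max scaling, and fixed-length sliding windows as the LSTM input format; the abstract does not specify these choices, and the function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def interpolate_missing(series: Sequence[Optional[float]]) -> List[float]:
    """Fill None entries by linear interpolation between the nearest observed neighbours.

    Leading or trailing gaps are filled with the nearest observed value.
    """
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series contains no observed values")
    filled: List[float] = []
    for i, v in enumerate(series):
        if v is not None:
            filled.append(float(v))
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled.append(float(series[right]))
        elif right is None:
            filled.append(float(series[left]))
        else:
            w = (i - left) / (right - left)
            filled.append(series[left] + w * (series[right] - series[left]))
    return filled


def min_max_normalize(series: Sequence[float]) -> List[float]:
    """Scale values to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [(v - lo) / (hi - lo) for v in series]


def make_windows(
    series: Sequence[float], window: int, horizon: int = 1
) -> List[Tuple[List[float], float]]:
    """Pair each length-`window` input slice with the value `horizon` steps after it."""
    pairs = []
    for start in range(len(series) - window - horizon + 1):
        inputs = list(series[start:start + window])
        target = series[start + window + horizon - 1]
        pairs.append((inputs, target))
    return pairs
```

In practice the min and max would be computed on the training split only and reused for validation and test data, so that no future information leaks into the scaling.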


SINAI at eRisk@CLEF 2023: Approaching Early Detection of Gambling with Natural Language Processing

Marmol-Romero, Alba Maria, Plaza-del-Arco, Flor Miriam, Montejo-Raez, Arturo

arXiv.org Artificial Intelligence

This paper describes the participation of the SINAI team in the eRisk@CLEF lab. Specifically, we addressed one of the proposed tasks: Task 2, on the early detection of signs of pathological gambling. Our approach to Task 2 is based on pre-trained Transformer models, combined with comprehensive data preprocessing and data-balancing techniques. Moreover, we integrate a Long Short-Term Memory (LSTM) architecture with AutoModels from the Transformers library. In this task, our team ranked seventh out of 49 participant submissions, with an F1 score of 0.126, and achieved the highest values in recall and in metrics related to early detection.
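The abstract mentions data-balancing techniques without naming them. One common option is random oversampling of minority classes, sketched below purely as an illustration; whether the SINAI system used this technique is an assumption, and `random_oversample` is a hypothetical helper.

```python
import random
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def random_oversample(
    texts: Sequence[str], labels: Sequence[Hashable], seed: int = 0
) -> Tuple[List[str], List[Hashable]]:
    """Duplicate randomly chosen minority-class examples until every class matches the majority count."""
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_texts, out_labels = list(texts), list(labels)
    for label, count in counts.items():
        pool = [t for t, y in zip(texts, labels) if y == label]
        for _ in range(target - count):
            out_texts.append(rng.choice(pool))
            out_labels.append(label)
    return out_texts, out_labels
```

Oversampling should be applied to the training split only; duplicating examples before splitting would place copies of the same text in both training and evaluation data.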


A Long Short-Term Memory (LSTM) Model for Business Sentiment Analysis Based on Recurrent Neural Network

Razin, Md. Jahidul Islam, Karim, Md. Abdul, Mridha, M. F., Rafiuddin, S M, Alam, Tahira

arXiv.org Artificial Intelligence

Business sentiment analysis (BSA) is a significant and popular topic in natural language processing. It is a kind of sentiment analysis applied for business purposes. Different categories of sentiment analysis techniques, such as lexicon-based techniques and various machine learning algorithms, have been applied to sentiment analysis in languages such as English, Hindi, and Spanish. In this paper, long short-term memory (LSTM), a type of recurrent neural network, is applied to business sentiment analysis. The LSTM model is used as a modified approach to mitigate the vanishing gradient problem, rather than applying a conventional recurrent neural network (RNN). A product review dataset is used to evaluate the modified RNN model: 70% of the data is used to train the LSTM and the remaining 30% for testing. The results of this modified RNN model are compared with those of other conventional RNN models, and the proposed model performs better than these conventional models, achieving an accuracy of around 91.33%. With this model, a business or e-commerce site can identify customer feedback on which types of products customers like or dislike and, based on the customer reviews, evaluate its marketing strategy.
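The claim that an LSTM mitigates the vanishing gradient problem rests on its additive cell-state update. A toy, scalar (hidden size 1) LSTM step makes that update explicit. This is the textbook LSTM cell, not the paper's specific "modified" model; the `Gate` class and function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


@dataclass
class Gate:
    w_x: float  # weight on the current input
    w_h: float  # weight on the previous hidden state
    b: float    # bias

    def pre(self, x: float, h_prev: float) -> float:
        return self.w_x * x + self.w_h * h_prev + self.b


def lstm_step(
    x: float, h_prev: float, c_prev: float,
    forget: Gate, inp: Gate, out: Gate, cand: Gate,
) -> Tuple[float, float]:
    """One scalar LSTM step; returns (h_t, c_t)."""
    f_t = sigmoid(forget.pre(x, h_prev))
    i_t = sigmoid(inp.pre(x, h_prev))
    o_t = sigmoid(out.pre(x, h_prev))
    g_t = math.tanh(cand.pre(x, h_prev))
    # Additive cell update: along the cell path dc_t/dc_{t-1} = f_t, so the signal
    # is scaled by the forget gate rather than squashed by a nonlinearity each step.
    c_t = f_t * c_prev + i_t * g_t
    h_t = o_t * math.tanh(c_t)
    return h_t, c_t


def run_lstm(xs: Sequence[float], forget: Gate, inp: Gate, out: Gate, cand: Gate) -> float:
    """Run the cell over a sequence from a zero state; return the final hidden state."""
    h, c = 0.0, 0.0
    for x in xs:
        h, c = lstm_step(x, h, c, forget, inp, out, cand)
    return h
```

In a sentiment classifier, the final hidden state (vector-valued in practice) would feed a classification layer.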


A Statistical Framework for Model Selection in LSTM Networks

Mostafa, Fahad

arXiv.org Machine Learning

Long Short-Term Memory (LSTM) neural network models have become the cornerstone of sequential data modeling in numerous applications, ranging from natural language processing to time series forecasting. Despite their success, model selection, including hyperparameter tuning, architecture specification, and choice of regularization, remains largely heuristic and computationally expensive. In this paper, we propose a unified statistical framework for systematic model selection in LSTM networks. Our framework extends classical model selection ideas, such as information criteria and shrinkage estimation, to sequential neural networks. We define penalized likelihoods adapted to temporal structures, propose a generalized threshold approach for hidden state dynamics, and provide efficient estimation strategies using variational Bayes and approximate marginal likelihood methods. Several examples on biomedical data demonstrate the flexibility and improved performance of the proposed framework.
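As a baseline for the "classical model selection ideas" this framework extends, here is how plain AIC and BIC would rank LSTM candidates of different sizes. This uses the standard criteria, not the paper's temporally adapted penalized likelihood, and assumes a single LSTM layer with one bias vector per gate; function names are hypothetical.

```python
import math
from typing import List, Tuple


def lstm_param_count(input_size: int, hidden_size: int, output_size: int = 0) -> int:
    """Trainable parameters of one LSTM layer (one bias per gate) plus an optional linear head."""
    # Four gates, each with input weights, recurrent weights, and a bias vector.
    lstm = 4 * hidden_size * (input_size + hidden_size + 1)
    head = hidden_size * output_size + output_size if output_size else 0
    return lstm + head


def aic(log_likelihood: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2.0 * k - 2.0 * log_likelihood


def bic(log_likelihood: float, k: int, n: int) -> float:
    """Bayesian information criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2.0 * log_likelihood


def select_by_bic(candidates: List[Tuple[str, float, int]], n: int) -> str:
    """Return the name of the candidate (name, log_likelihood, k) with the lowest BIC."""
    return min(candidates, key=lambda c: bic(c[1], c[2], n))[0]
```

PyTorch's `nn.LSTM` keeps two bias vectors per gate (`bias_ih` and `bias_hh`), so its parameter count is `4 * hidden_size` larger than this formula. Raw parameter counts tend to overstate the effective complexity of heavily regularized networks, which is part of the motivation for adapted penalties.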


Predicting Stock Prices with FinBERT-LSTM: Integrating News Sentiment Analysis

Gu, Wenjun, Zhong, Yihao, Li, Shizun, Wei, Changsong, Dong, Liting, Wang, Zhuoyue, Yan, Chao

arXiv.org Artificial Intelligence

A rising stock market typically mirrors a flourishing economy, whereas a decline is often an indicator of an economic downturn. Consequently, factors correlated with stock-market trends have long been widely discussed, and interest in financial text mining is growing. The inherent instability of stock prices makes them acutely responsive to fluctuations within the financial markets. In this article, we use deep learning networks to predict stock prices based on historical stock prices and on financial, business, and technical news articles that convey market information. We show that integrating weighted news categories into the forecasting model improves predictive precision. We used a pre-trained NLP model known as FinBERT, designed to discern sentiment in financial texts, and extended it with a Long Short-Term Memory (LSTM) architecture to construct the FinBERT-LSTM model. This model uses news categories that follow the stock market's hierarchy, namely market, industry, and stock-related news, combined with the previous week's stock prices, for prediction. We selected NASDAQ-100 index stock data, trained the model on Benzinga news articles, and used Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Accuracy as the key metrics for assessing and comparing the models' performance. The results indicate that FinBERT-LSTM performs best, followed by LSTM, with the DNN model ranking third.
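Two of the evaluation metrics named above have standard definitions, sketched here together with one way a weighted news-category sentiment feature could be formed. The weighting scheme (a normalized weighted average over market, industry, and stock categories) is a hypothetical illustration; the abstract does not give the actual weights or combination rule.

```python
from typing import Dict, Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Error."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Percentage Error, in percent; undefined when a true value is 0."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a true value is zero")
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_news_sentiment(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-category sentiment scores (hypothetical weighting scheme)."""
    total = sum(weights[c] for c in scores)
    return sum(weights[c] * s for c, s in scores.items()) / total
```

MAE is in price units, while MAPE is scale-free, which makes it easier to compare errors across NASDAQ-100 stocks with very different price levels.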


Off-the-Shelf Neural Network Architectures for Forex Time Series Prediction come at a Cost

Zafeiriou, Theodoros, Kalles, Dimitris

arXiv.org Artificial Intelligence

Our study compares the performance and resource requirements of different Long Short-Term Memory (LSTM) neural network architectures with those of a specialized ANN architecture for forex market prediction. We analyze the execution time of the models as well as the resources they consume, such as memory and computational power. Our aim is to demonstrate that the specialized architecture not only achieves better results in forex market prediction but also runs with fewer resources and in less time than the LSTM architectures. This comparative analysis provides insights into the suitability of these two types of architectures for time series prediction in the forex market.
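A comparison of execution time and memory like the one described can be instrumented with the Python standard library alone. The sketch below is an assumption about how such measurements might be taken, not the authors' harness; `profile_call` is a hypothetical helper.

```python
import time
import tracemalloc
from typing import Any, Callable, Tuple


def profile_call(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float, int]:
    """Run fn once; return (result, wall-clock seconds, peak traced Python heap bytes)."""
    started_here = not tracemalloc.is_tracing()
    if started_here:
        tracemalloc.start()
    try:
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        elapsed = time.perf_counter() - start
        # If tracing was already active, the peak covers everything since it started,
        # not only this call.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        if started_here:
            tracemalloc.stop()
    return result, elapsed, peak
```

`tracemalloc` only sees allocations made through Python's allocator, so memory held by native libraries or GPU frameworks may be missed; a single timed call is also noisy, and repeated runs give a fairer comparison between architectures.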


Comparative analysis of neural network architectures for short-term FOREX forecasting

Zafeiriou, Theodoros, Kalles, Dimitris

arXiv.org Artificial Intelligence

The present document delineates the analysis, design, implementation, and benchmarking of various neural network architectures within a short-term frequency prediction system for the foreign exchange market (FOREX). Our aim is to simulate the judgment of a human expert (technical analyst) with a system that responds promptly to changes in market conditions, thus enabling the optimization of short-term trading strategies. We designed and implemented a series of LSTM neural network architectures, which take exchange-rate values as input and generate a short-term market-trend forecasting signal, and a custom ANN architecture based on technical-analysis indicator simulators. We performed a comparative analysis of the results and drew useful conclusions about the suitability of each architecture and its cost in time and computational power. The custom ANN architecture produces better prediction quality with higher sensitivity, using fewer resources and less time than the LSTM architectures. It appears to be ideal for low-power computing systems and for use cases that need fast decisions at the lowest possible computational cost.
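The custom ANN is described as built on technical-analysis indicator simulators, but the indicators are not named. As a hypothetical example of the kind of indicator involved, here is a simple moving-average crossover that turns exchange-rate values into an up/down/flat trend signal; the choice of indicator and the window lengths are assumptions.

```python
from typing import List, Sequence


def sma(values: Sequence[float], window: int) -> List[float]:
    """Simple moving averages over every full window."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [sum(values[i - window + 1:i + 1]) / window for i in range(window - 1, len(values))]


def trend_signal(rates: Sequence[float], short: int = 3, long: int = 5) -> int:
    """+1 if the latest short SMA is above the long SMA, -1 if below, 0 if equal."""
    if short >= long:
        raise ValueError("short window must be shorter than long window")
    if len(rates) < long:
        raise ValueError("need at least `long` observations")
    s = sma(rates, short)[-1]
    l = sma(rates, long)[-1]
    return 1 if s > l else -1 if s < l else 0
```

An indicator like this is cheap to compute (a few additions per step), which illustrates why an indicator-based design can run with far less compute than a recurrent network over the same rate series.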


Evaluating Driver Readiness in Conditionally Automated Vehicles from Eye-Tracking Data and Head Pose

Kazemi, Mostafa, Rezaei, Mahdi, Azarmi, Mohsen

arXiv.org Artificial Intelligence

As automated driving technology advances, the driver's ability to resume control of the vehicle in conditionally automated vehicles becomes increasingly critical. In SAE Level 3 (conditionally automated) vehicles, the driver needs to be available and ready to intervene when necessary, which makes it essential to evaluate their readiness accurately. This article presents a comprehensive analysis of driver readiness assessment that combines head pose features and eye-tracking data. The study explores the effectiveness of predictive models in evaluating driver readiness, addressing the challenges of dataset limitations and limited ground-truth labels. Machine learning techniques, including LSTM architectures, are utilised to model driver readiness based on the spatio-temporal status of the driver's head pose and eye gaze. The experiments revealed that a Bidirectional LSTM architecture combining both feature sets achieves a mean absolute error of 0.363 on the DMD dataset, demonstrating superior performance in assessing driver readiness. The modular architecture of the proposed model also allows the integration of additional driver-specific features, such as steering wheel activity, enhancing its adaptability and real-world applicability.
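Two ideas in this abstract, fusing head-pose and gaze features per time step and reading a sequence in both directions, can be shown independently of any deep learning library. In the sketch below the recurrent cell is a generic stand-in (any function from state and input to new state); in a real BiLSTM it would be an LSTM cell. All names are hypothetical, and this is not the authors' model.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

S = TypeVar("S")  # recurrent state type
X = TypeVar("X")  # per-step input type


def fuse_features(
    head_pose: Sequence[Tuple[float, ...]], gaze: Sequence[Tuple[float, ...]]
) -> List[Tuple[float, ...]]:
    """Concatenate head-pose and gaze feature tuples at each time step."""
    if len(head_pose) != len(gaze):
        raise ValueError("feature streams must be aligned in time")
    return [hp + gz for hp, gz in zip(head_pose, gaze)]


def run_direction(step: Callable[[S, X], S], xs: Sequence[X], h0: S) -> List[S]:
    """Apply a recurrent step over xs, collecting the state after each input."""
    h, states = h0, []
    for x in xs:
        h = step(h, x)
        states.append(h)
    return states


def bidirectional(
    step_fwd: Callable[[S, X], S], step_bwd: Callable[[S, X], S], xs: Sequence[X], h0: S
) -> List[Tuple[S, S]]:
    """Pair each step's forward state (past context) with its backward state (future context)."""
    fwd = run_direction(step_fwd, xs, h0)
    bwd = run_direction(step_bwd, list(reversed(xs)), h0)[::-1]
    return list(zip(fwd, bwd))
```

With a toy summing cell `lambda h, x: h + x` over `[1, 2, 3]`, each output pairs the running total from the left with the running total from the right, showing how every time step sees both earlier and later context.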